Enter the directory of the maca folder on your drive and the name of the tissue you want to analyze.
tissue_of_interest = "Marrow"
Warning message:
package ‘Seurat’ was built under R version 3.4.3
library(here)
here() starts at /Users/olgabot/Documents/GitHub/tabula-muris
source(here("00_data_ingest", "02_tissue_analysis_rmd", "boilerplate.R"))
package ‘dplyr’ was built under R version 3.4.2
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
package ‘tidyverse’ was built under R version 3.4.2── Attaching packages ──────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ tibble 1.3.4 ✔ purrr 0.2.4
✔ tidyr 0.7.2 ✔ stringr 1.2.0
✔ readr 1.1.1 ✔ forcats 0.2.0
package ‘tidyr’ was built under R version 3.4.2package ‘purrr’ was built under R version 3.4.2── Conflicts ─────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::expand() masks Matrix::expand()
✖ dplyr::filter() masks stats::filter()
✖ cowplot::ggsave() masks ggplot2::ggsave()
✖ dplyr::lag() masks stats::lag()
tiss = load_tissue_droplet(tissue_of_interest)
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
[1] "Scaling data matrix"
|
| | 0%
|
|=== | 4%
|
|======= | 8%
|
|========== | 12%
|
|============= | 17%
|
|================ | 21%
|
|==================== | 25%
|
|======================= | 29%
|
|========================== | 33%
|
|============================== | 38%
|
|================================= | 42%
|
|==================================== | 46%
|
|======================================== | 50%
|
|=========================================== | 54%
|
|============================================== | 58%
|
|================================================= | 62%
|
|===================================================== | 67%
|
|======================================================== | 71%
|
|=========================================================== | 75%
|
|=============================================================== | 79%
|
|================================================================== | 83%
|
|===================================================================== | 88%
|
|======================================================================== | 92%
|
|============================================================================ | 96%
|
|===============================================================================| 100%
Calculating gene means
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating gene variance to mean ratios
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Visualize top genes in principal components
Later on (in FindClusters and TSNE) you will pick a number of principal components to use. This has the effect of keeping the major directions of variation in the data and, ideally, supressing noise. There is no correct answer to the number to use, but a decent rule of thumb is to go until the plot plateaus.
PCElbowPlot(object = tiss)
Choose the number of principal components to use.
# Set number of principal components.
n.pcs = 10
The clustering is performed based on a nearest neighbors graph. Cells that have similar expression will be joined together. The Louvain algorithm looks for groups of cells with high modularity–more connections within the group than between groups. The resolution parameter determines the scale…higher resolution will give more clusters, lower resolution will give fewer.
For the top-level clustering, aim to under-cluster instead of over-cluster. It will be easy to subset groups and further analyze them below.
# Set resolution
res.used <- 0.5
tiss <- FindClusters(object = tiss, reduction.type = "pca", dims.use = 1:n.pcs,
resolution = res.used, print.output = 0, save.SNN = TRUE)
To visualize
# If cells are too spread out, you can raise the perplexity. If you have few cells, try a lower perplexity (but never less than 10).
tiss <- RunTSNE(object = tiss, dims.use = 1:n.pcs, seed.use = 10, perplexity=100, dim.embed = 2)
# note that you can set do.label=T to help label individual clusters
TSNEPlot(object = tiss, do.label = T)
Compare to previous annotations
filename = here('00_data_ingest', '03_tissue_annotation_csv',
paste0(tissue_of_interest, "_droplet_annotation.csv"))
previous_annotation = read_csv(filename)
Missing column names filled in: 'X1' [1]Parsed with column specification:
cols(
X1 = col_character(),
channel = col_character(),
cell_ontology_class = col_character(),
cell_ontology_id = col_character()
)
tiss@meta.data[, 'previous_annotation'] <- "NA"
tiss@meta.data[as.character(previous_annotation$X1), 'previous_annotation'] <- as.character(previous_annotation$cell_ontology_class)
TSNEPlot(object = tiss, do.return = TRUE, group.by = "previous_annotation")
table(tiss@meta.data[, "previous_annotation"])
basophil early pro-B cell erythroblast
61 64 155
Fraction A pre-pro B cell granulocyte granulocytopoietic cell
38 732 371
hematopoietic precursor cell immature B cell late pro-B cell
397 118 264
macrophage monocyte proerythroblast
223 528 265
promonocyte T cell
274 162
table(tiss@meta.data[, "previous_annotation"], tiss@ident)
0 1 2 3 4 5 6 7 8 9 10 11 12 13
basophil 0 0 0 0 0 0 0 0 0 0 0 0 0 61
early pro-B cell 0 0 0 0 0 0 0 0 0 0 0 0 64 0
erythroblast 0 0 1 0 0 0 0 0 0 0 154 0 0 0
Fraction A pre-pro B cell 0 0 37 1 0 0 0 0 0 0 0 0 0 0
granulocyte 0 432 0 6 0 270 0 0 0 0 1 23 0 0
granulocytopoietic cell 0 0 0 368 3 0 0 0 0 0 0 0 0 0
hematopoietic precursor cell 0 0 70 2 324 0 1 0 0 0 0 0 0 0
immature B cell 0 0 0 0 3 0 0 0 1 0 0 113 1 0
late pro-B cell 0 0 0 0 0 0 264 0 0 0 0 0 0 0
macrophage 0 0 1 1 0 0 0 0 221 0 0 0 0 0
monocyte 524 0 3 0 0 0 0 0 1 0 0 0 0 0
proerythroblast 0 0 0 0 0 0 0 265 0 0 0 0 0 0
promonocyte 1 0 273 0 0 0 0 0 0 0 0 0 0 0
T cell 0 0 0 0 0 0 0 0 0 162 0 0 0 0
Check expression of genes of interset. Found inconsistensies in gene names. Mme (Cd10), Ly6g6c, Ly6g6e, and Iga2b though present in plate data was not found here and gave an error.